Jinja + MarkupSafe adoption for AnnData._repr_html_ #9

Open
katosh wants to merge 24 commits into html_rep from jinja-markup-poc-2

@katosh katosh commented Apr 21, 2026

Jinja + MarkupSafe adoption for AnnData._repr_html_

Rebuilds the HTML-repr's rendering layer on top of Jinja2 + MarkupSafe while preserving every feature of the original implementation. There is no user-facing behavior change — all 26 scenarios in the visual inspection harness render identically modulo random-data noise.

Open this first — it's the fastest way to confirm the migration preserves every repr feature (nested AnnData, Raw, README modal, SVG tree, TreeData / SpatialData ecosystem examples, no-CSS / no-JS fallbacks, and the adversarial "Evil AnnData" case).

Visual inspection preview: https://gistpreview.github.io/?d4241e6eadd2bfd211f5dca90e2403bc

Why

The original repr assembled HTML via f-strings and a convention that "if your variable is user data, you must remember to call escape_html() before interpolating." That discipline worked but:

  • left the safety invariant to reviewer diligence,
  • made the data / presentation split implicit, and
  • mixed template logic with Python control flow.

@flying-sheep's review of #2236 argued — correctly — that Jinja's autoescape-by-default and MarkupSafe's typed trust boundary give those guarantees for free. This PR adopts that position.

What changed, in two sentences

  1. Rendering pipeline: Python code now builds structured context dicts and feeds them to Jinja templates. Autoescape is on (select_autoescape(default=True, default_for_string=True)); every plain str is escaped at interpolation, every markupsafe.Markup passes through verbatim.
  2. Type contract: FormattedOutput fields carrying HTML are typed Markup | None (renamed from *_html to *_markup — the fields are Markup, not raw strings). Extension packages producing HTML must wrap at the formatter boundary: FormattedOutput(preview_markup=Markup(obj._repr_html_())).
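
The trust boundary described above can be seen in a few lines. This is a minimal sketch, not the PR's `environment.py`: the one-cell template is hypothetical, and only the `select_autoescape` arguments are taken from the description.

```python
from jinja2 import Environment, select_autoescape
from markupsafe import Markup

# Autoescape is on for every template, including ones built from strings.
env = Environment(
    autoescape=select_autoescape(default=True, default_for_string=True)
)

# Hypothetical one-cell template, not one of the PR's .j2 files.
cell = env.from_string("<td>{{ value }}</td>")

# A plain str is escaped at interpolation time.
untrusted = cell.render(value="<b>bold</b>")
# A Markup instance is an explicit trust claim and passes through verbatim.
trusted = cell.render(value=Markup("<b>bold</b>"))

print(untrusted)  # <td>&lt;b&gt;bold&lt;/b&gt;</td>
print(trusted)    # <td><b>bold</b></td>
```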

Architecture at a glance

anndata/_repr/
├── environment.py      Jinja Environment + autoescape + NUL-scrub finalize hook
├── templates/
│   ├── anndata.j2      outer repr frame
│   ├── section.j2      <details>/<summary> section frame (reused by error/unknown)
│   ├── entry.j2        entry row (name/type/preview cells + expandable variant)
│   ├── _macros.j2      reusable macros (badge, copy_button, muted_span, …)
│   ├── header.j2       top header (type + shape + badges + search)
│   ├── footer.j2       version + memory
│   ├── index_preview.j2 obs_names / var_names preview
│   ├── hints.j2        no-CSS / no-JS hint block
│   ├── error_entry.j2  section-level error placeholder
│   ├── raw_section.j2  Raw single-row section frame
│   ├── raw_repr.j2     inner body of an expanded Raw row
│   ├── x_entry.j2      X attribute row
│   └── max_depth_indicator.j2  depth-limit placeholder
├── registry.py         FormattedOutput (now with Markup | None fields)
├── components.py       thin Python wrappers that call macros
├── formatters.py       built-in TypeFormatters (return FormattedOutput)
├── sections.py         per-section renderers (obs, var, uns, obsm, varm, raw, unknown)
├── core.py             render_section, render_formatted_entry, render_x_entry
├── html.py             generate_repr_html (orchestrator)
└── utils.py            format_number, format_memory_size, format_index_preview, …

Every rendering entry point returns Markup. Internal composition uses Markup("\n").join(...) / Markup.format() or macro calls — never raw str.
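
A short sketch of that composition rule, using only MarkupSafe. The row contents are made up for illustration; the point is that `Markup.join` and `Markup.format` escape any plain `str` they receive and keep `Markup` inputs intact.

```python
from markupsafe import Markup

rows = [
    Markup("<tr><td>obs</td></tr>"),  # already-trusted fragment
    "<tr><td>injected</td></tr>",     # plain str: treated as untrusted text
]

# Markup.join escapes the plain-str element and returns a Markup instance.
body = Markup("\n").join(rows)

# Markup.format escapes non-Markup arguments; `body` is Markup, so it is
# inserted unchanged.
table = Markup("<table>{}</table>").format(body)

print(table)
```

The plain-string row ends up as visible text (`&lt;tr&gt;...`) rather than as live markup, which is the failure mode you want when a caller forgets to wrap.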

Public API

FormattedOutput — renamed fields

Before (raw str)   After (Markup | None)   Why
preview_html       preview_markup          Type annotation actually matches the contract
type_html          type_markup             (same)
expanded_html      expanded_markup         (same)

Plain-text siblings (type_name, preview, tooltip, error) keep their names — they remain autoescaped str.

Ecosystem formatters

Register a TypeFormatter or SectionFormatter exactly as before. The only authoring change: wrap your HTML output in Markup(...) at the boundary:

from markupsafe import Markup
from anndata._repr import register_formatter, TypeFormatter, FormattedOutput

@register_formatter
class MyArrayFormatter(TypeFormatter):
    def can_format(self, obj, context):
        return isinstance(obj, MyArrayType)

    def format(self, obj, context):
        return FormattedOutput(
            type_name=f"MyArray {obj.shape}",
            css_class="anndata-dtype--myarray",
            # Build custom HTML with Markup.format: each non-Markup arg is
            # autoescaped by MarkupSafe. Never use f-string interpolation
            # inside Markup(...) — that bypasses autoescape.
            preview_markup=Markup(
                '<span class="anndata-text--muted">({} items)</span>'
            ).format(obj.n_items),
        )

If the preview doesn't need custom HTML, prefer plain text (autoescaped end-to-end):

preview=f"({obj.n_items} items)"

If your extension already has an _repr_html_() from another package, wrap it at the boundary:

preview_markup=Markup(obj._repr_html_())

Three valid idioms, in order of preference:

  1. preview=<str> — plain text, autoescaped. Simplest.
  2. preview_markup=Markup('<tag>{}</tag>').format(value) — standard MarkupSafe pattern. Use when you need custom HTML.
  3. preview_markup=get_macros().<macro>(value) — invoke a Jinja macro directly; benefits from the template engine's NUL-scrub finalize hook.

Never write Markup(f'...{value}...') — the f-string interpolates before Markup sees it, bypassing autoescape.
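
The difference is easy to reproduce with MarkupSafe alone. The sketch below uses a made-up hostile value; nothing in it is anndata-specific.

```python
from markupsafe import Markup

value = "<script>alert(1)</script>"

# Anti-pattern: the f-string is evaluated first, so Markup receives a
# finished string and trusts all of it, including the script tag.
unsafe = Markup(f"<span>{value}</span>")

# Recommended: the template is trusted, the argument is escaped by
# Markup.format before insertion.
safe = Markup("<span>{}</span>").format(value)

print(unsafe)  # <span><script>alert(1)</script></span>
print(safe)    # <span>&lt;script&gt;alert(1)&lt;/script&gt;</span>
```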

Helper modules

All render_* helpers (badges, entry cells, section scaffold, nested content) return Markup. All macros in _macros.j2 are callable from Python via get_macros() (see environment.py) for extension packages that want to compose repr pieces without going through the templates directly.
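
For readers unfamiliar with calling Jinja macros from Python, here is a rough sketch of how such an accessor could work. It is an assumption-laden stand-in, not the PR's `environment.py`: the `badge` macro body, the in-memory `DictLoader`, and this `get_macros` implementation are all hypothetical. It relies on Jinja's `Template.module`, which exposes a template's macros as Python callables.

```python
from functools import cache

from jinja2 import DictLoader, Environment
from markupsafe import Markup

# Hypothetical macro file standing in for _macros.j2.
MACROS = {
    "_macros.j2": (
        "{% macro badge(text, css_class='badge') %}"
        '<span class="{{ css_class }}">{{ text }}</span>'
        "{% endmacro %}"
    )
}


@cache
def get_macros():
    """Return the macro module once; later calls reuse the cached module."""
    env = Environment(loader=DictLoader(MACROS), autoescape=True)
    return env.get_template("_macros.j2").module


# Arguments are autoescaped inside the macro; the result is Markup.
badge = get_macros().badge("<backed>")
print(badge)  # <span class="badge">&lt;backed&gt;</span>
```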

Safety posture

What's enforced

  • Autoescape on every template interpolation — any str passed into {{ … }} gets HTML-escaped. The only way untrusted data can appear verbatim is if it's wrapped as Markup, which is an explicit trust claim.
  • TestMarkupAutoescapeContract (new, in tests/repr/test_repr_robustness.py) verifies the three *_markup fields correctly escape bare-str contract violations and pass Markup-typed values through verbatim.
  • TestEscapingCoverage (pre-existing) verifies every user-data insertion point (obs / var / uns keys and values, category values, DataFrame columns, README content) gets HTML-escaped.
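  • A caller-supplied _container_id is validated against ^[A-Za-z][A-Za-z0-9_-]*$ before being interpolated into the <script> block, and a ValueError is raised on violation. The auto-generated UUID path was never exposed to this gap; the check closes it for explicit IDs as defense in depth.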
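
The container-ID check from the last bullet could look roughly like the following. The function name `validate_container_id` and the error message are hypothetical; only the pattern and the `ValueError` come from the PR. The sketch uses `re.fullmatch` because `$` in Python's `re` also matches just before a trailing newline.

```python
import re

# Same character classes as the pattern quoted above; fullmatch() anchors
# both ends, so "^" and "$" are not needed.
_CONTAINER_ID_RE = re.compile(r"[A-Za-z][A-Za-z0-9_-]*")


def validate_container_id(container_id: str) -> str:
    """Return the ID unchanged if it is safe to embed, else raise ValueError."""
    if not _CONTAINER_ID_RE.fullmatch(container_id):
        raise ValueError(f"invalid container id: {container_id!r}")
    return container_id


validate_container_id("anndata-repr-1a2b")  # accepted
try:
    validate_container_id("x');alert(1)//")
except ValueError as exc:
    print(exc)
```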

Known MarkupSafe caveat: NUL bytes

MarkupSafe's escape() / Markup.format() intentionally do not scrub NUL bytes (\x00). NUL is not HTML-significant per the HTML5 spec, but it can truncate attribute values in some parsers and generally produces ugly output.

How anndata handles it:

  1. Jinja template path (the bulk of rendering): a finalize hook on the Environment replaces NULs in every str interpolation with U+FFFD. Every template-rendered attribute and text node is NUL-safe.
  2. Macro path for extension authors: get_macros().<macro>(user_data) renders through the same engine, so it picks up the finalize hook automatically — this is the recommended pattern when custom HTML must embed user data.
  3. Targeted internal sites: format_index_preview() and _build_readme_icon() (the only Python-side sites that touch potentially-NUL user data) scrub explicitly via .replace("\x00", "\ufffd").

We deliberately did not introduce a custom safe_format helper. The MarkupSafe API is the common vocabulary in the Python ecosystem; shadowing it with a bespoke wrapper would add something to learn for marginal benefit. Extension authors needing NUL safety for custom HTML can either call a macro via get_macros() or scrub inline — both are documented. The trust boundary in FormattedOutput.*_markup is about XSS (handled by autoescape), not NUL hygiene.
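
To make the NUL handling concrete, the sketch below registers a `finalize` callback on a Jinja `Environment` and compares the result with a bare `markupsafe.escape` call. The callback name `scrub_nul` and the template are hypothetical; the PR's actual hook may differ in detail.

```python
from jinja2 import Environment
from markupsafe import escape


def scrub_nul(value):
    """Replace NUL bytes in str values; leave every other type unchanged."""
    if isinstance(value, str):
        return value.replace("\x00", "\ufffd")
    return value


# finalize runs on each interpolated expression before autoescape is applied.
env = Environment(autoescape=True, finalize=scrub_nul)
attr = env.from_string('<span title="{{ v }}">{{ v }}</span>')

rendered = attr.render(v="a\x00<b>")

# MarkupSafe on its own escapes HTML characters but keeps the NUL byte,
# so Python-side call sites need the explicit .replace() shown above.
plain_escape = escape("a\x00<b>")

print("\x00" in rendered)      # False
print("\x00" in plain_escape)  # True
```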

Dependencies

Adds two direct runtime dependencies (both MIT-licensed, widely packaged):

  • jinja2>=3.1
  • markupsafe>=3.0

Jinja already depends on MarkupSafe, so installing jinja2 pulls in markupsafe as well; declaring both adds no package beyond what jinja2 alone requires. No other dependency changes.

Metrics

  • HTML tags in Python (excluding docstring examples and the <pre> repr_html_enabled=False fallback):
    • sections.py: 41 → ~1
    • html.py: 22 → ~1 (the <pre> fallback)
    • components.py: 35 → ~0
    • core.py: 11 → ~0
  • escape_html call sites: 78 → 0 (helper removed entirely).
  • Templates: 0 → 13.

What this PR does not do

  • Does not add new user-facing features. Every displayed element, badge, truncation, color dot, SVG tree, nested AnnData view, README modal, search box, keyboard shortcut, and CSS rule is preserved.
  • Does not change the repr's CSS or JavaScript. Those files are untouched; only how the HTML around them is assembled has changed.
  • Does not require any ecosystem package to update unless they were using preview_html / type_html / expanded_html directly — in which case the field renames (*_markup) are the only change, and their HTML must now be Markup-wrapped at the boundary.

katosh added 24 commits April 20, 2026 14:33
Routes the top-level repr through a single autoescape-enabled Jinja template
and wraps existing formatter-produced HTML fragments in markupsafe.Markup at
the boundary. Formatter internals (formatters.py, registry.py, components.py,
sections.py, core.py) are untouched.

The safety contract at the outer template:
- plain-str values (container_id, depth, style) are autoescaped by default
- Markup-wrapped fragments (header, sections, css, js, hints) pass through

Adds jinja2>=3.1 and markupsafe>=3.0 to dependencies. Adds a minimal
Environment module and one outer anndata.j2 template.

The existing tests/visual_inspect_repr_html.py visual harness runs cleanly
against this branch and produces the full 26-scenario comparison artifact.
Repr test suite: 614 passed, 1 skipped — zero regressions.
Replace the f-string-assembled section frame in ``core.py`` with
``templates/section.j2``. One template covers both the normal
``<details>``-with-entries shape and the empty-state placeholder.

- ``render_section`` now renders through the template and returns ``Markup``.
- ``render_empty_section`` delegates to ``render_section(n_items=0, …)``.
- ``render_truncation_indicator`` now returns ``Markup``.
- Constants used inside templates (``CSS_TEXT_ERROR``, ``CSS_TEXT_MUTED``,
  ``NOT_SERIALIZABLE_MSG``, ``STYLE_HIDDEN``) are exposed as environment
  globals so templates can reference them symbolically.

Transition: ``render_section(entries_html=…)`` accepts ``str`` or
``Markup``. Bare ``str`` is wrapped in ``Markup`` at the boundary so
existing callers that still produce raw HTML fragments are preserved.
Entry-level rendering will migrate in a follow-up; at that point
internal callers will pass ``Markup`` directly.
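
As an illustration of the environment-globals mechanism mentioned in this commit, the sketch below registers one constant and references it by name from a template. The value given to `CSS_TEXT_MUTED` is an assumption (borrowed from the class name in the PR's formatter example), and the template is hypothetical.

```python
from jinja2 import Environment

env = Environment(autoescape=True)

# Assumed value; the real constant lives in the anndata source.
env.globals["CSS_TEXT_MUTED"] = "anndata-text--muted"

# Hypothetical template: the class name is referenced symbolically instead
# of being hardcoded in the markup.
empty = env.from_string('<span class="{{ CSS_TEXT_MUTED }}">{{ label }}</span>')
html = empty.render(label="(empty)")

print(html)  # <span class="anndata-text--muted">(empty)</span>
```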
Adds templates/_macros.j2 (badge, copy_button, muted_span, warning_icon,
wrap_button) and templates/entry.j2 for one-shot row rendering.

render_formatted_entry now returns Markup via entry.j2 instead of
assembling sub-cells in Python. Small component helpers (render_badge,
render_copy_button, render_muted_span, render_warning_icon, wrap-button
helpers) delegate to the new macros and return Markup.
render_header_badges composes its parts via Markup.join.

environment.get_env() now uses a finalize callback to scrub NUL bytes
from str values before autoescape, preserving the scrubbing previously
done in utils.escape_html on the Python render path.

Public API is stable; return types tightened from str to Markup
(Markup subclasses str, so callers using string ops still work).
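
A small sketch of what "string ops still work" means in practice, with one behavior worth knowing: concatenating a plain `str` onto a `Markup` escapes the plain part. The fragment contents are made up.

```python
from markupsafe import Markup

fragment = Markup("<b>obs</b>")

# Markup is a str subclass, so existing isinstance checks keep passing.
assert isinstance(fragment, str)

# String methods return Markup, so the trust claim is preserved.
upper = fragment.upper()

# Adding a plain str escapes it first; the result is still Markup.
combined = fragment + " & <i>var</i>"

print(upper)     # <B>OBS</B>
print(combined)  # <b>obs</b> &amp; &lt;i&gt;var&lt;/i&gt;
```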
Section-level HTML assemblers (_render_unknown_sections,
_render_error_entry, _render_raw_section, and the per-attr section
renderers) now return Markup. Internal "\n".join(parts) stays but
the result is wrapped at the return boundary. Return type annotations
updated from str to Markup.

No behavioral change — existing escape_html discipline is preserved;
the Markup wrap makes the trust claim explicit and unblocks the phase-C
finalization that removes the transitional str→Markup wrap in
render_formatted_entry.
TypeFormatters that produce preview_html / expanded_html / type_html
now return Markup at construction (not str). The dataclass types were
already tightened in phase B; this tightens the values.

No behavioral change — each fragment already used escape_html on user
data and is safe HTML; the Markup wrap makes the trust claim explicit
and allows phase C finalization to remove the transitional
str→Markup wrap currently applied in render_formatted_entry.
…phase C)

FormattedOutput.preview_html / type_html / expanded_html are now typed
``Markup | None`` (was ``str | None``). Component helpers that were still
returning ``str`` — render_entry_row_open, render_search_box,
render_nested_content, render_name_cell, render_entry_type_cell,
render_entry_preview_cell — now return ``Markup``. Their composition sites
use ``Markup("").join(...)`` so the return value is a real ``Markup``
instance rather than a plain string that happens to contain HTML.

TypeCellConfig.type_html is also typed ``Markup | None``.

No behavioral change: every site already used escape_html() around user
data. Tightening the types makes the trust boundary enforceable at the
annotation level and removes the ambiguity of "is this str safe HTML or
not?" Paired with the formatter/section wraps in 95ddde9 and 9a7f273,
this lets the next commit remove the transitional str→Markup wrap in
render_formatted_entry.
…tion)

Phase B added an implicit Markup wrap in render_formatted_entry so that
formatters still returning str preview_html / type_html / expanded_html
could flow through entry.j2 without being autoescaped. With all internal
formatters (95ddde9), sections (9a7f273), and components (42f655f)
now returning Markup, the wrap is dead weight — remove it.

Also wraps the two fallback-formatter preview_html f-strings in
registry.py in Markup(), and updates the module docstring example to
show the Markup(...) idiom at the formatter boundary.

FormattedOutput.preview_html / type_html / expanded_html are now
Markup-typed end-to-end: every internal producer returns Markup, the
dataclass stores Markup, and entry.j2 passes it through autoescape
verbatim. Extension packages producing HTML must now wrap at the
formatter boundary (documented in the module docstring).
Post-phase-C, FormattedOutput.preview_html / expanded_html are typed
Markup | None and entry.j2 autoescapes bare str. The ecosystem
examples in tests/visual_inspect_repr_html.py were still passing raw
str for:

- TreeData ObstSectionFormatter / VartSectionFormatter: expanded_html
  was the SVG tree string from _render_tree_svg
- TreeMetadataSectionFormatter.render_html: returned raw str
- MuData ModSectionFormatter: expanded_html was generate_repr_html output
- SpatialData: preview_html for images / labels / points / shapes,
  expanded_html for the nested AnnData in tables
- Uns TypeFormatter custom preview
- Ontology extensibility TypeFormatter preview_html

Each site now wraps its HTML in markupsafe.Markup so the trust claim
is explicit — matches the public contract now that the type tightening
is end-to-end.

Bug this fixes: "gene_ontology DiGraph (54 nodes, 45 leaves)" placeholder
and the nested AnnData inside SpatialData tables were being rendered as
escaped HTML source text instead of their intended markup.
The Jinja migration typed these fields as ``markupsafe.Markup | None``;
the ``_html`` suffix was a leftover from the plain-``str`` era and
misdescribed their contract — a bare string flowing into a field named
``preview_html`` looked fine to ecosystem authors but was silently
autoescaped at the template boundary.

Renames (on FormattedOutput, TypeCellConfig, entry.j2, and every call
site and docstring):

- preview_html       → preview_markup
- type_html          → type_markup
- expanded_html      → expanded_markup
- append_type_html   → append_type_markup  (bool flag that mirrors
  the renamed field)
- index_preview_html (local in html.py) → index_preview_markup

The plain-text siblings (``preview``, ``type_name``, ``tooltip``)
keep their names — the bare-name / ``_markup``-suffix pair now cleanly
reflects the type contract: autoescaped str vs trusted Markup.

Also wraps every bare-string assignment to the renamed fields in
``Markup(...)``: four docstring examples in __init__.py / registry.py /
core.py, and three test assignments in test_repr_registry.py /
test_repr_formatters.py that were previously passing through the
autoescape path and being rendered as escaped HTML text.

No backwards-compat shims: nothing has been released.
Stragglers from the phase-C review that kept a few str return types and
transitional wraps alive:

- ``core.py::render_x_entry`` → returns ``Markup`` (was ``str``); parts
  list is typed ``list[Markup]`` and joined with ``Markup("\n").join``.
- ``html.py``: ``_render_header`` / ``_render_footer`` /
  ``_render_index_preview`` / ``_render_max_depth_indicator`` →
  ``Markup``. Their callers in ``generate_repr_html`` drop the
  transitional ``Markup(...)`` wraps.
- ``html.py::_render_all_sections`` → ``list[Markup]``; drop the
  ``[Markup(s) for s in …]`` comprehension at the caller.
- ``html.py::_render_section`` / ``_render_custom_section`` → ``Markup``
  (the latter wraps ``formatter.render_html(…)`` output so extension
  packages can still return plain ``str``).
- ``html.py::generate_repr_html`` → ``Markup``. This removes a redundant
  ``Markup(Markup(...))`` double-wrap in ``AnnDataFormatter``'s nested
  repr construction.
- ``formatters.py::AnnDataFormatter.format`` — drop the inner redundant
  ``Markup(...)`` now that ``generate_repr_html`` returns ``Markup``.
- ``_render_footer`` uses ``Markup('<tag>{}</tag>').format(value)``
  for safe plain-text interpolations (version string, memory size)
  instead of redundant ``escape_html`` on strings that can't contain
  HTML chars.
- ``html.py``: moved the side-effect ``from . import formatters``
  next to the other first-party imports (was below ``TYPE_CHECKING``).

Also wraps two bare-string assignments surfaced by the rename:
- ``tests/repr/test_repr_registry.py:550`` — ``preview_markup=f'…'``
  → ``Markup(f'…')``
- ``tests/repr/test_repr_formatters.py:841`` — ``expanded_markup=tree_html``
  where ``tree_html`` was a bare triple-quoted string → ``Markup(...)``

Drops "POC" / "middle-ground" language from ``anndata.j2`` and
``core.py`` now that the migration has landed.
…h B)

Internal callers in sections.py (3 sites) and html.py (1 site) switch
from ``"\n".join(rows)`` to ``Markup("\n").join(rows)``. With every
caller now producing ``Markup``, ``render_section``'s signature
tightens to ``entries: Markup`` (was ``str | Markup``) and the
transitional implicit-wrap comment in ``core.py:108-111`` is gone.

``render_empty_section`` feeds ``Markup("")`` for the same reason.
Docstring examples in ``core.py`` and ``__init__.py`` updated to show
the new idiom.
…call sites

Replace the `Markup(f'...{escape_html(x)}...')` pattern with the idiomatic
`Markup('...{}...').format(x)` — MarkupSafe's `.format()` autoescapes non-Markup
args, so the manual escape_html wrapping is redundant. Treating the template
string as trusted HTML and letting .format() escape user data is less error
prone and removes a hand-rolled escape boundary at every call site.

Migrated ~27 call sites across:
- components.py: row_open, search_box, name_cell, category_list, type_cell
- core.py: x_entry (error + type), formatted_entry error preview
- formatters.py: DataFrame columns preview, color swatches
- html.py: header type/filepath/lazy filepath, README icon, disabled fallback
- registry.py: unknown-type error and warning previews
- sections.py: unknown sections type cell, error entry

`escape_html` definition retained in utils.py (remains exported from
anndata._repr as a public helper). Usage in src/anndata/_repr/ drops from
27 call sites to 0 (outside utils.py's own internal use).

Side improvements along the way:
- registry.py: fixed a latent bug where the unknown-type warning preview
  built a plain str instead of a Markup (`preview_markup = f'...'`). Now a
  Markup, matching the field's type.
- html.py: the `repr_html_enabled=False` fallback now returns a Markup
  instead of a plain str, matching the function's declared return type.

All 614 repr tests pass; pre-commit clean.
Moves all entry-row cell markup into `_macros.j2` so the Jinja templates
are the single source of truth. Python helpers in `components.py` and
`core.py` become thin wrappers that call the macros via `.module`, so
the public signatures and return types (Markup) don't change.

Macros added to `_macros.j2`:
- `name_cell(entry_key)`
- `type_cell(type_name, css_class, type_markup=None, tooltip='',
  all_warnings=None, is_not_serializable=false, has_columns_list=false,
  has_categories_list=false, append_type_markup=false)` — takes every
  input as an explicit parameter instead of reading outer template scope
- `preview_cell(preview_markup=None, preview_text=None)`
- `row_open(key, dtype, css_class, has_expandable_content=false)` —
  caller builds the space-joined class string
- `nested_content(html_content)`
- `truncation_indicator(remaining)`
- `category_list(items, total_hidden=0)` — `items` is a sequence of
  `(label, safe_color_or_none)` pairs so Python keeps ownership of
  `sanitize_css_color`; total_hidden is pre-computed by the wrapper

`entry.j2` now just imports the macros and dispatches the row open /
three cells / close — no more local macro definitions.

HTML-tag count in `components.py` drops from 29 to 10 (remaining tags
are inside `render_search_box`, which is out of scope for C1, plus
docstring examples).

Tests: 614 passed, 1 skipped; visual inspection unchanged.
…nts) (C3)

Move the remaining orchestrator-layer HTML in `_repr/html.py` into Jinja
templates. Python keeps the structural logic (badge construction, README
truncation, backing-info lookup, memory formatting); templates now own all
frame HTML.

New templates in `_repr/templates/`:
- `header.j2` — `<div class="anndata-header">` with type/shape, an ordered
  list of pre-rendered `extras` (badges + filepath spans + README icon),
  and an optional search box.
- `footer.j2` — `<div class="anndata-footer">` with version + optional
  memory string.
- `index_preview.j2` — two-line obs_names / var_names preview.
- `hints.j2` — static no-CSS / no-JS hint block.
- `max_depth_indicator.j2` — single-line depth-limit placeholder.

`_render_header`, `_render_footer`, `_render_index_preview`,
`_render_max_depth_indicator` now build a context dict and call the matching
template. A new `_render_hints()` helper replaces the inline hints Markup in
`generate_repr_html`. README-icon Markup construction is factored into
`_build_readme_icon` (kept in Python so the truncation logic doesn't leak
into templates).

Visible HTML-tag constructs in `html.py` drop from 19 to 5 (the remainder:
a `<pre>` fallback when repr is disabled, two tiny filepath `<span>`
wrappers that pair with badges, and two strings inside docstrings/comments).

Tests: `614 passed, 1 skipped` in `tests/repr/`.
Moves HTML scaffolding for three section-level renderers in sections.py
out of Python string concatenation and into Jinja templates:

- _render_unknown_sections now reuses section.j2 (extended with an
  optional extra_classes parameter) so the "other" section drops into
  the same <details>/<summary> frame as every other section while still
  getting its anndata-sec-unknown class. Per-row type/preview cells are
  built with the existing components (TypeCellConfig, render_name_cell,
  render_entry_preview_cell).
- _render_error_entry now renders through a new error_entry.j2. This
  section doesn't fit section.j2's entries-grid shape (it holds a single
  red error message, not an entries list), so a dedicated template is
  clearer than overloading section.j2.
- _render_raw_section now renders its outer frame through a new
  raw_section.j2. It's a single-row "anndata-sec" wrapper (not the
  normal anndata-section <details> frame), so it also gets its own
  template. The row itself is still assembled from components.py.
- _generate_raw_repr_html (the body rendered inside the expanded Raw
  row) now renders through a new raw_repr.j2, which mirrors
  anndata.j2's shape but drops the sections Raw doesn't have.

Also:
- core.render_section grows an extra_classes keyword used by the
  unknown-section path.
- _safe_index_preview extracted from the inline try/except ladder in
  _generate_raw_repr_html so the template just receives Markup|None.

HTML tag count in sections.py drops from 30 to effectively 1 (only
<div class="anndata-entry__nested-anndata"> remains, wrapping the
nested Raw repr inside the expandable entry; see comment below).

Tests: 614 passed, 1 skipped. No behavior change intended.
New class ``TestMarkupAutoescapeContract`` in
tests/repr/test_repr_robustness.py (4 tests):

- Three negative tests: an extension-style TypeFormatter returns
  ``FormattedOutput(preview_markup=<bare str with <script>>)`` / same
  for type_markup and expanded_markup. All three verify the script tag
  comes out as ``&lt;script&gt;...&lt;/script&gt;`` — Jinja autoescape
  catches the contract violation, so a future regression that loosens
  the type back to ``str`` (or a template change that disables
  autoescape) can't silently land.
- One positive control: a correctly-wrapped ``Markup(html)`` value
  flows through verbatim.

Release note for PR scverse#2236 updated to mention the Jinja/MarkupSafe
dependency and the ``*_markup`` field naming.
- ``section.j2`` no longer hardcodes ``'(empty)'`` when ``n_items == 0``;
  ``render_section`` sets the default ``count_str`` based on n_items so
  callers that pass an explicit ``count_str`` (e.g. ``"(5 columns)"``)
  are respected even for empty sections.
- New ``filepath_span(path, style='')`` macro in ``_macros.j2`` with a
  Python wrapper ``render_filepath_span`` in components.py. ``html.py``
  uses it for the backed/lazy filepath spans — drops the last two
  ``<span>`` f-strings in the orchestrator (html.py HTML-tag count goes
  from 5 to 2; the residual 2 are false-positives in a docstring and
  the ``<pre>`` repr_html_enabled=False fallback).
- ``format_index_preview`` now returns ``Markup`` (was ``str`` that
  happened to contain escaped HTML — a latent double-escape hazard).
  Items are joined with ``Markup(", ").join(...)`` for autoescape. The
  ``Markup(format_index_preview(...))`` re-wraps in ``_render_index_preview``
  are dropped.
- Orchestrator templates (``header.j2``, ``footer.j2``,
  ``index_preview.j2``) reindented to the 2-space convention used by
  every other block-level template; the inline/macro files
  (``_macros.j2`` and ``entry.j2``) stay compact on purpose.
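
The double-escape hazard noted for ``format_index_preview`` can be reproduced with MarkupSafe alone. The index names below are made up, and the snippet only mimics the old and new return shapes; it is not the function itself.

```python
from markupsafe import Markup, escape

names = ["cell<1>", "cell&2"]

# Old shape: a plain str that already contains escaped text.
pre_escaped = ", ".join(str(escape(n)) for n in names)
# Anything that autoescapes a plain str will escape it a second time.
double_escaped = escape(pre_escaped)

# New shape: Markup.join escapes each plain-str item once and returns Markup,
# so a later escape() is a no-op.
joined = Markup(", ").join(names)
escaped_once = escape(joined)

print(double_escaped)  # cell&amp;lt;1&amp;gt;, cell&amp;amp;2
print(escaped_once)    # cell&lt;1&gt;, cell&amp;2
```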
… strengthen NUL test (F3)

- validate caller-supplied _container_id against ^[A-Za-z][A-Za-z0-9_-]*$; raise ValueError on violation (auto-UUID path unchanged)
- add TestMarkupAutoescapeContract.test_section_formatter_render_html_is_trusted pinning the ecosystem-extension trust contract
- strengthen test_unicode_in_readme: assert NUL not in rendered HTML
escape_html removal (14 sites):
- Definition + html import in src/anndata/_repr/utils.py
- Public export in src/anndata/_repr/__init__.py (import + __all__ entry)
- test_escape_html unit test in tests/repr/test_repr_utils.py
- escape_html assertion in tests/repr/test_repr_core.py
- Extension example in tests/test_repr.py (migrated to Markup.format())
- 8 ecosystem example sites in tests/visual_inspect_repr_html.py
  (imports, docstring, TreeMetadata/MockSpatialData/ontology formatters
  all migrated to Markup(...).format(...) pattern)

_macros() dedup:
- Moved to src/anndata/_repr/environment.py as get_macros()
- Dropped local helpers + cache/get_env imports in components.py and core.py
- Updated 14 call sites (12 in components.py, 2 in core.py)

__all__ audit (src/anndata/_repr/__init__.py):
- Removed STYLE_HIDDEN, NOT_SERIALIZABLE_MSG, DOCS_BASE_URL,
  get_section_doc_url from __all__ and their now-unused imports.
  No external references in tests/ or docs/.
Replaces scattered ``Markup('<tag>{}</tag>').format(...)`` patterns with
Jinja macros where the template owns the HTML and autoescapes the
variables. Python call sites keep a single ``Markup(_macros().<name>(...))``
wrap around trusted macro output.

Macros added to ``_macros.j2``:
- ``error_preview(message)`` / ``warning_preview(message)`` /
  ``muted_error_span(error_msg)``: preview-column status spans.
- ``columns_preview(columns)``: DataFrame column list in obsm/varm.
- ``color_swatch(color, label, valid=true)`` /
  ``color_preview(swatches, overflow_count=0)``: single swatch +
  aggregated swatch wrapper with "+N" tail.
- ``nested_anndata_wrapper(inner)``: trusted Markup wrapper for nested
  AnnData repr fragments.
- ``readme_icon(content, tooltip)``: ⓘ icon with data-readme attribute.
- ``pre_fallback(text)``: <pre> block for the HTML-disabled fallback.
- ``search_box(search_id)``: full hidden search input + toggles.

New template ``x_entry.j2`` replaces the 30-line parts-list construction in
``render_x_entry`` with a state-dispatched template (ok / none /
attribute_error / format_error).

Migrated call sites (11): core.py (x_entry, formatted_entry error preview),
registry.py (fallback error/warning previews), formatters.py (DataFrame
column preview, categorical count fallbacks, color swatches + wrapper,
nested AnnData wrapper), sections.py (raw nested wrapper, unknown sections
now route through ``render_formatted_entry``), components.py (search box),
html.py (README icon + <pre> fallback).

``environment.py`` gains CSS_COLORS, CSS_COLORS_SWATCH,
CSS_COLORS_SWATCH_INVALID, CSS_DTYPE_UNKNOWN, CSS_NESTED_ANNDATA, and
CSS_TEXT_WARNING as env globals so the macros can reference them by name.

Side effect: the README icon migration fixes a pre-existing NUL-byte leak.
The previous ``Markup(...).format(readme_content, tooltip_text)`` only
HTML-escaped; NUL bytes flowed through into the ``data-readme`` attribute.
``_build_readme_icon`` now scrubs NULs explicitly before handing off to
the macro.
…expose get_macros()

Issue surfaced by review: the docstring examples in __init__.py and
registry.py were teaching ecosystem authors the anti-pattern

    preview_markup=Markup(f'<span class="...">({obj.n_items} items)</span>')

The f-string interpolates ``obj.n_items`` before Markup sees it, bypassing
autoescape. Replaced every example (six sites across __init__.py, registry.py,
and core.py) with the correct idioms:

- preview=<str>                                      (plain text, autoescaped)
- preview_markup=Markup('<tag>{}</tag>').format(v)   (standard MarkupSafe)
- preview_markup=Markup(obj._repr_html_())           (reuse trusted HTML)
- preview_markup=get_macros().my_macro(v)            (Jinja macro path)

Exposes ``get_macros()`` from ``anndata._repr`` so the fourth idiom is
available to extension packages — the macro path is the only one that
benefits from the engine's NUL-scrub finalize hook.

Explicitly flags ``Markup(f'...{v}...')`` as the pattern to avoid in both
the registry.py module docstring and the ``TypeFormatter`` class docstring.

No behavior change. Nothing added to the public API beyond ``get_macros``
(already used internally by components.py / core.py).
str.join over a list of Markup returns plain str, which Jinja
autoescapes when interpolated by render_section(entries=...). The
result was every entry's HTML rendering as escaped text (&lt;div&gt;).

Switch to Markup("\n").join(rows) in the six render_section call
sites inside MockSpatialData so the demo teaches the right idiom.
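
The bug is reproducible outside anndata. In this sketch the section template and rows are hypothetical; the behavior shown (``str.join`` returning a plain ``str`` even when every element is ``Markup``) is standard Python.

```python
from jinja2 import Environment
from markupsafe import Markup

env = Environment(autoescape=True)
# Hypothetical stand-in for the section frame.
section = env.from_string('<div class="section">{{ entries }}</div>')

rows = [Markup("<div>a</div>"), Markup("<div>b</div>")]

# str.join drops the Markup type, so autoescape treats the result as text.
broken = section.render(entries="\n".join(rows))

# Markup.join keeps the type, so the rows render as markup.
fixed = section.render(entries=Markup("\n").join(rows))

print(type("\n".join(rows)).__name__)  # str
print(broken)
print(fixed)
```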
# Conflicts:
#	src/anndata/_repr/html.py